ABSTRACT

Biomolecular channels play important roles in many
biological systems, e.g. enzymes, ribosomes and ion
channels. This article introduces a web-based
interactive MOLEonline 2.0 application for the analysis of
access/egress paths to interior molecular voids.
MOLEonline 2.0 enables platform-independent,
easy-to-use and interactive analyses of
(bio)macromolecular channels, tunnels and pores. Results
are presented in a clear manner, making their
interpretation easy. For each channel, MOLEonline displays
a 3D graphical representation of the channel, its
profile accompanied by a list of lining residues and
also its basic physicochemical properties. The users
can tune advanced parameters when performing a
channel search to direct the search according to
their needs. The MOLEonline 2.0 application is
freely available via the Internet at
http://ncbr.muni.cz/mole or http://mole.upol.cz.

INTRODUCTION 

Tunnels or channels, pores, cavities and voids are
structural features of many biomolecular systems
possessing significant biological functions. The
following are just a few of the numerous examples where
channels play an important biological role: highly
selective ion channels (1-6), channels and pathways in
photosystem II (7,8), the ribosomal polypeptide exit
channel (9), substrate-determining active site access
channels of Cytochrome P450 (10-15) and haloalkane
dehalogenases, where mutagenesis of substrate access
channels alters enzyme activity (16,17). As an empty
interior space is a key feature of this type of
biomolecule, a considerable amount of attention has been
paid to analyzing its properties (18-20). Many
algorithms and software tools have been developed to
identify these structures in (bio)macromolecules,
including grid (16,21-24), space filling (25) and slice
methods (26,27), and Voronoi diagrams (18,28-30).

CAVER (16), MOLE (28), MolAxis (29,30) and
PROPORES (31) are all dedicated software tools for
analyzing molecular channels. CAVER 1.0 (16) evaluates
grid nodes with a cost function based on the square of
the reciprocal distance to the closest atom, and then
employs Dijkstra's algorithm (32) to select the shortest
and most geometrically convenient pathway from an
internal to an external point. In 2005, CAVER 1.0
represented a considerable advance in the automatic
detection of channels. However, its algorithm suffered
from several limitations, which have since been overcome
in the later software MOLE (28). The core of the
MOLE 1.0 algorithm again utilizes Dijkstra's path
search algorithm, which is applied to a Voronoi mesh
(33,34). A later published tool, MolAxis (29,30),
uses an algorithm similar to MOLE. Another recent
tool, PROPORES (31), searches for channels in a
similar fashion to CAVER, but it also rotates side
chains along the channel so that they adopt sterically
allowed positions in order to enlarge possible bottlenecks.

This article presents the web-based MOLEonline appli- 
cation (ver. 2.0), which offers a user-friendly, interactive 
and platform-independent environment for the setup, 
manipulation, analysis and printing of channel search 
results. Besides structural features, MOLEonline also
allows analysis of the basic physicochemical properties
of (bio)macromolecular channels, tunnels and pores.

DESCRIPTION OF THE TOOL

Using the MOLEonline 2.0 application involves three
steps: (i) setup; (ii) calculation; and (iii) results
visualization and manipulation.

Setup 

The structure to be analyzed can be either taken from the
Protein Data Bank (PDB) server (35) or uploaded in the
PDB format. Once the structure is uploaded, it is
visualized by the Jmol Java plugin (36). In addition, the
sequence corresponding to the structure can be explored in
an interactive window, enabling selection of the starting
residues (Figure 1). MOLEonline enables the user to define
the starting point based on the center of mass of selected
residues, chosen from the sequence, or manually by
entering x, y and z coordinates. In the case of known and
annotated enzyme structures, MOLEonline allows the use of
information on the active site residues from the Catalytic
Site Atlas (CSA) database (37) and the use of the
biological unit instead of the asymmetric unit. The last
possibility is to use 'Automatic starting points'. These
points are the deepest points in the protein's cavities,
and using them can provide primary information on the
layout of channels inside a protein.
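
As an illustration of a residue-based starting point, the following
minimal Python sketch computes the geometric centroid of the atoms
belonging to the selected residues. It is not the MOLEonline code: the
function name and the input layout are hypothetical, and the sketch
uses an unweighted centroid, so atomic masses are ignored.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def starting_point(atom_positions: Iterable[Point]) -> Point:
    """Return the unweighted centroid of the given atomic centers.

    Assumption: every atom counts equally (no mass weighting).
    """
    pts = list(atom_positions)
    if not pts:
        raise ValueError("at least one atom position is required")
    n = len(pts)
    return (
        sum(p[0] for p in pts) / n,
        sum(p[1] for p in pts) / n,
        sum(p[2] for p in pts) / n,
    )


# Three made-up atom positions (in Angstroms) from the selected residues.
seed = starting_point([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)])
```

Manually entered x, y and z coordinates would simply replace the
computed seed.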

Calculation 

After setup, the calculation of channels is executed by the
MOLE 2.0 software (D. Sehnal et al., unpublished data)
running on a server. All setup and structure information
is deposited on the server in a unique directory (which is
translated into a unique URL for a web browser). After the
MOLE 2.0 calculation, further analyses of the channel
results are carried out, providing comprehensive and
easily interpretable information about the channels (see
below).

The channel computation in the MOLE 2.0 software is
performed in several steps as follows:

(1) the Voronoi diagram is computed; 

(2) the Voronoi diagram is refined and split into several 
smaller parts called cavity diagrams, representing all 
the empty space in the molecule; 

(3) starting and ending points are identified in each of 
the cavity diagrams; and 

(4) Dijkstra's shortest path algorithm is used to find the 
channels between the pairs of starting and ending points. 
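
Step (4) can be sketched in Python as follows. This is a generic
Dijkstra search over a small hand-made graph, not the MOLE 2.0
implementation: the graph layout, the function names and in particular
the edge cost (edge length divided by the squared clearance) are
assumptions chosen only to show how a weight can favor short, wide
paths.

```python
import heapq
import math
from typing import Dict, List, Tuple

# vertex -> list of (neighbour, edge length, clearance to nearest atom surface)
Graph = Dict[str, List[Tuple[str, float, float]]]


def edge_weight(length: float, clearance: float) -> float:
    """Hypothetical cost: long or narrow edges are expensive."""
    return length / (clearance ** 2 + 1e-6)


def shortest_channel(graph: Graph, start: str, end: str) -> List[str]:
    """Return the cheapest vertex path from start to end, or [] if none."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == end:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, length, clearance in graph.get(u, []):
            nd = d + edge_weight(length, clearance)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if end not in dist:
        return []
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path


# Toy cavity graph: the route through B is longer but much wider.
toy = {
    "S": [("A", 1.0, 1.0), ("B", 1.0, 2.0)],
    "A": [("E", 1.0, 1.0)],
    "B": [("E", 2.0, 2.0)],
}
path = shortest_channel(toy, "S", "E")
```

In this toy graph the search prefers the wider route through B even
though it is geometrically longer, which is the kind of trade-off a
clearance-aware weight is meant to express.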




Figure 1. MOLEonline 2.0 setup webpage for channel calculation. Each job is assigned a job ID to allow easy access to the results. Setup starts with the selection of a PDB file (here 1TQN) either from the PDB database or uploaded as a user file. The tunnel starting point can be selected automatically (inside cavities detected by the MOLE 2.0 algorithm) or manually, by using the CSA (37), via selection through the interactive sequence applet at the bottom of the page or by specifying x, y, z coordinates in advanced settings. Advanced settings also enable the adjustment of parameters determining the tunnel searching algorithm. All parameters are set in Ångströms (for details see the text).

The Voronoi diagram divides a metric space according to the
distances between discrete sets of specified objects. In our
case, the objects are atomic centers with van der Waals
(vdW) radii assigned according to the parm99 force field
(38). The molecular surface is calculated as a
probe-accessible surface with a defined probe radius
(default 3 Å). A vertex of the Voronoi diagram is removed if
a sphere with the interior threshold radius (default 1.25 Å)
cannot pass through any of the tetrahedron sides. The
Voronoi diagram is split into several smaller cavity
diagrams that are analyzed for suitable channel start and
end points between vertices of the cavity. Starting points
are initially estimated by considering a centroid from all
the corresponding atomic centers of the residues selected
by the user. Starting points are then selected within a
specified origin radius (default 3 Å) as the closest vertex
for each cavity. End points are selected for each cavity
diagram as the tetrahedra on the boundary vertices. Channel
exits can only be assigned to those tetrahedra that are
separated by a distance equivalent to the surface cover
radius (default 10 Å). Finally, when the set of start and
end points has been identified for each cavity, Dijkstra's
shortest path algorithm is used to find the channels
between all pairs of start and end points. The edge weight
function used in the algorithm takes into account the
distance to the surface of the closest vdW sphere and the
edge length. The channel centerline is represented as a 3D
natural spline. Depending on the density of computed exits,
the algorithm may find duplicate channels. Therefore, in
the final post-processing step, if two channels are nearly
identical, the longer one is removed. A detailed
explanation of the algorithm (also as a scheme) and the
parameters of the calculation can be found on the
MOLEonline webpage (e.g.
http://mole.upol.cz/documentation/).

MOLE 2.0 outperforms the original MOLE (28) algorithm in
many aspects. For instance, it is quicker due to the
division of the internal space within the macromolecule
into separate subcavities. There is no need to determine
the number of channels prior to calculation. MOLE 2.0 also
enables automatic selection of the starting points and
calculation of some basic physicochemical properties of
the channel-lining residues.
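
The selection of a starting vertex within the origin radius can be
illustrated with a short Python sketch. The function name and the
plain list of vertex coordinates are hypothetical; only the rule
itself (the closest cavity vertex within the origin radius, default
3 Å) is taken from the text.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def pick_start_vertex(
    seed: Point,
    cavity_vertices: Sequence[Point],
    origin_radius: float = 3.0,
) -> Optional[Point]:
    """Return the cavity vertex closest to seed within origin_radius.

    Returns None when the cavity has no vertex inside the radius.
    """
    best = None
    best_distance = origin_radius
    for vertex in cavity_vertices:
        distance = math.dist(seed, vertex)
        if distance <= best_distance:
            best, best_distance = vertex, distance
    return best


# Made-up Voronoi vertices of one cavity (coordinates in Angstroms).
vertices = [(5.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
start = pick_start_vertex((0.0, 0.0, 0.0), vertices)
```

Calling the function once per cavity diagram yields at most one
starting vertex for each cavity.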

Results visualization and analysis 

Profiles of the channels found are presented in three ways:
(i) plots of channel radii against length (visualized using
gnuplot, http://www.gnuplot.info, as PNG images); (ii)
an interactive table summarizing the set of lining residues
and physicochemical properties; and (iii) the channel
isosurface along its centerline, which is visualized in the
Jmol plugin (36).

MOLEonline 2.0 also allows calculation of basic
physicochemical properties for the unique channel-lining
amino acid side chains (these properties are not
calculated for nucleobases). Charge, hydropathy and
hydrophobicity indices (39,40), polarity (41) and
mutability (42) can be estimated.

Charge is calculated as the sum of charges on the side 
chains (at pH ~7) lining the channel. 

Hydropathy (39) is calculated as an average of the
hydropathy index of lining side chains, where the most
hydrophilic is Arg (-4.5) and the most hydrophobic is
Ile (+4.5).
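
The charge and hydropathy estimates can be reproduced with a few lines
of Python. This is a sketch under stated assumptions rather than the
MOLEonline code: the function names are hypothetical, the table is the
widely used Kyte-Doolittle hydropathy index (whose extremes match the
Arg and Ile values quoted above), and His is treated as neutral at
pH ~7.

```python
from typing import Iterable, Optional

# Kyte-Doolittle hydropathy index, keyed by three-letter residue code.
HYDROPATHY = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}

# Assumed side-chain charges at pH ~7 (His taken as neutral).
SIDE_CHAIN_CHARGE = {"ARG": 1, "LYS": 1, "ASP": -1, "GLU": -1}


def channel_charge(lining_residues: Iterable[str]) -> int:
    """Sum of side-chain charges over the lining residues."""
    return sum(SIDE_CHAIN_CHARGE.get(r, 0) for r in lining_residues)


def channel_hydropathy(lining_residues: Iterable[str]) -> Optional[float]:
    """Average hydropathy index; residues missing from the table are skipped."""
    values = [HYDROPATHY[r] for r in lining_residues if r in HYDROPATHY]
    return sum(values) / len(values) if values else None


lining = ["ARG", "GLU", "ILE", "ALA"]  # made-up set of unique lining residues
charge = channel_charge(lining)
hydropathy = channel_hydropathy(lining)
```

Skipping unknown residue names mirrors the statement that the
properties are not calculated for nucleobases.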

Hydrophobicity (40) is calculated as an average of
normalized hydrophobicity scales, where the most
hydrophilic residue is Glu (-1.14) and the most
hydrophobic residue is Ile (1.81).

Polarity (41) is calculated as an average of amino acid
polarity. Polarity values range from zero for non-polar
amino acids (Ala and Gly), through values of around
1.5 for polar residues (e.g. Ser 1.67), to two-digit
values for charged residues (Glu 49.90, Arg 52.00).

Mutability (42) is calculated as an average of the relative
mutability index. Relative mutability is high for readily
mutated amino acids, e.g. small polar amino acids (Ser 117,
Thr 107, Asn 104) or small aliphatic amino acids (Ala 100,
Val 98, Ile 103). On the other hand, the mutability is low
for amino acids that play important structural roles, such
as aromatic amino acids (Trp 25, Phe 51, Tyr 50) or special
amino acids (Cys 44, Pro 58, Gly 50).

Such an approach gives only an approximate value of
mutability, whereas sequence-specific analyses can be
performed using the multiple sequence alignment tools in
other programs, e.g. ConSurf (43) and Hotspot Wizard
(44). It is worth noting that the estimated physicochemical
properties should be interpreted with care, as the
calculation is based on the assumption that the side chains
of the lining amino acids significantly determine the
environment within the identified channel. The calculation
might also be sensitive to the exact position of the
starting and ending points.

Users of MOLEonline can download all results as a
report. Channel centerline positions, together with the
radii of the maximally inscribed balls, can also be
downloaded in two formats for further analysis and
storage: (i) as a generic PDB file or (ii) as a Python
script for visualization in PyMOL (http://www.pymol.org).
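
Once the centerline positions and radii have been extracted from a
downloaded file, basic profile statistics are easy to derive. The
Python sketch below assumes the data have already been parsed into
(x, y, z, radius) tuples; this layout and the function name are
hypothetical, and no particular column mapping of the exported PDB
file is implied.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float, float]  # x, y, z, radius (Angstroms)


def profile_summary(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Return (channel length, bottleneck radius) for ordered samples.

    Length is approximated by summing straight segments between
    consecutive centerline points; the bottleneck is the smallest radius.
    """
    if len(samples) < 2:
        raise ValueError("at least two centerline samples are required")
    length = sum(
        math.dist(a[:3], b[:3]) for a, b in zip(samples, samples[1:])
    )
    bottleneck = min(s[3] for s in samples)
    return length, bottleneck


# Made-up centerline with three samples.
centerline = [
    (0.0, 0.0, 0.0, 2.0),
    (3.0, 4.0, 0.0, 1.41),
    (3.0, 4.0, 5.0, 2.5),
]
length, bottleneck = profile_summary(centerline)
```

Because the segments are straight, the summed length slightly
underestimates the length of a smooth spline through the same points.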

RESULTS AND DISCUSSION 

Examples of usage 

Microsomal Cytochrome P450 (CYP) enzymes are
important for the metabolism of many endogenous
compounds and xenobiotics (45,46). CYPs share a
buried active site (47), which is connected to the outside
environment by various access/egress channels (15). These
channels are responsible for substrate passage to and
product release from the active site, and they are
considered to be involved in the substrate preferences of
CYPs, which have been shown to vary considerably among CYP
enzymes (12,13). Figure 2 shows all the channels
connecting the active site of CYP3A4 (PDB: 1TQN) with the
exterior [calculation started from Glu 308 and Thr 309
according to the CSA (37)]. The top ranked channel found by
MOLEonline (white in Figure 2) is the solvent channel (15).
The solvent channel is 17 Å long and its bottleneck is
1.41 Å wide. The solvent channel is also rather
hydrophilic, as its hydropathy equals -1.9. By comparison,
the hydropathy values of channels 2e, 2a and 2f are -0.2,
-0.3 and 0.4, respectively, which suggests that these
channels are less hydrophilic. The same trend can be seen
in the hydrophobicity index, which again suggests that the
solvent channel is more hydrophilic (-0.68) than channels
2e, 2a and 2f, with values of 0.1, 0.08 and 0.1,
respectively. These findings are consistent with previous
data, which have identified the solvent channel as the main
channel responsible for active site solvation (48) and
hydrophilic product release (13,49), while channels of the
2x family are considered to be involved in hydrophobic
substrate binding (13,50).

The ribosomal exit tunnel (RET) allows nascent peptide
chains synthesized at the peptidyl transferase center to
exit the ribosome (9). Analysis of ribosomal channels
represents a challenge for software tools like MOLEonline,
due to the considerable size and complexity of ribosomes
[approximately 100 000 heavy atoms (28)]. Figure 3 shows
the RET of the large ribosomal subunit from Haloarcula
marismortui (PDB: 1JJ2, containing 90 650 atoms). In
order to achieve optimal results, the channel search
parameters had to be adjusted. Since the RET is large
enough for the passage of a nascent peptide, with a channel
bottleneck radius of ~3 Å, the probe radius has to be
greater (6 Å) to capture the channel. The interior
threshold also has to be increased (2.4 Å) to avoid
additional small channels in the structure. In addition,
the surface cover radius should be enlarged (20 Å) to
avoid redundant channels appearing. Two residues of the
peptidyl transferase center were chosen as the start of
the RET (Chain 0: U 2620, A 2486). The calculation takes
~35 s on the server (CPU Intel i5 760 2.8 GHz, 4 GB RAM),
while the total time, including transfer of data onto
client web pages, is ~1-2 min. The length of the ribosomal
exit channel is ~100 Å, with three bottlenecks of minimum
radii ~4.5 Å (Figure 3). The RET is highly hydrophilic,
polar and mostly lined by negatively charged residues
(11 nt from 23S rRNA have their negatively charged main
chains oriented toward the channel and two Glu residues
have side chains facing the channel); the negative charge
is to some extent compensated by six Arg residues. The
distributed charge of the ribosomal polypeptide exit
channel is important to prevent the nascent peptides
from becoming 'stuck' inside the ribosome.

Limitations 

The presented application has four main limitations. The
first limitation stems from the initial concept that the
channels are extrapolated as sets of maximally inscribed
balls along the channel centerline. Such an extrapolation
does not allow complex channels with bulges to be
mapped accurately. The second limitation arises because
the channel-finding algorithm is applied to an
atom-centered Voronoi mesh. In principle, the additively
weighted Voronoi graph or power diagram offers some
benefits in terms of precision, but the gain in precision is
small compared to the uncertainties associated with the
chosen structures (e.g. X-ray structures with finite
resolution, which is generally numerically higher, i.e.
coarser, than 0.8 Å), the treatment of hydrogen atoms and
the atomic radii set. The third limitation is that the
analysis of transmembrane pores is less convenient,
because the transmembrane pores have to be merged from
pore segments identified as tunnels by MOLEonline 2.0.
The final limitation relates to the software and data
handling on the server, which limits the maximal size of
the studied system to around 100 000 atoms (8 MB).

Figure 2. Results of channel analysis of Cytochrome P450 3A4 (CYP3A4) using the setup shown in Figure 1. Four channels found from the user-specified starting point are shown, whereas the automatic detection also found 17 additional tunnels, which are not shown for clarity. The profile of tunnel #1 along the centerline and the list of lining residues are shown in the external windows (right-hand side). A list of all the unique lining residues and the corresponding side chains alone is displayed along with the physicochemical properties of the respective channel. Lining residues can also be visualized along the channel centerline, with the channel represented by maximally inscribed spheres in the Jmol window. It is also possible to show the molecular surface and all detected cavities and their volumes. In addition, starting points can be shown as small cubes for the original user-defined starting point (in magenta), for the optimized position of this starting point (in green) and for all automatically detected starting points (in yellow). Information about tunnel profiles and lining residues can be further exported in the form of a report, a PDB file or a Python file for visualization in PyMOL.

Figure 3. Visualization of the ribosomal exit tunnel (RET) of the large ribosomal subunit from Haloarcula marismortui (PDB: 1JJ2). The figure was prepared in PyMOL using an exported Python file containing the positions of all the channels identified by MOLEonline. Only the RET is shown, and the ribosomal proteins L4 (green), L22 (blue) and L39E (red) lining the tunnel are highlighted. The channel profile shows the positions of three bottlenecks.

CONCLUSIONS

In this article, we described MOLEonline 2.0
(http://ncbr.muni.cz/mole or http://mole.upol.cz), a new
web-based interactive tool for the analysis of molecular
channels and pores. The MOLEonline interface enables
platform-independent, easy-to-use and interactive
analyses and offers the prospect of high automation, e.g.
by downloading structures from the PDB database and
employing automatic active site identification based on
the CSA. The results of the channel search using
MOLEonline are presented in a clear visual or data
form, making their interpretation and further
manipulation easy.